Overview

Dataset statistics

Number of variables: 18
Number of observations: 6703
Missing cells: 11207
Missing cells (%): 9.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 942.7 KiB
Average record size in memory: 144.0 B
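
As a quick consistency check, the missing-cell figures can be recomputed from the counts in this report, assuming that the number of cells is variables times observations. The sketch below is plain Python with illustrative variable names; the per-variable missing counts are the ones reported in the variable sections of this report.

```python
# Figures copied from the dataset statistics in this report.
n_variables = 18
n_observations = 6703
missing_cells = 11207

# Per-variable missing counts as listed in the variable sections.
per_variable_missing = {
    "start_mifid_days": 3,
    "finish_mifid_days": 2890,
    "first_deposit_days": 4719,
    "first_trade_investor_account_demo_days": 3595,
}

total_cells = n_variables * n_observations          # every variable has one cell per row
missing_pct = 100 * missing_cells / total_cells     # share of cells that are missing

print(total_cells)
print(f"{missing_pct:.1f}%")
print(sum(per_variable_missing.values()) == missing_cells)
```

The per-variable counts add up to the reported total, so all missing cells are accounted for by these four variables.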

Variable types

Categorical: 8
Numeric: 10

Warnings

mifid_money_other_brokers is highly correlated with mifid_invested_other_brokers [High correlation]
mifid_invested_other_brokers is highly correlated with mifid_money_other_brokers [High correlation]
finish_mifid_days has 2890 (43.1%) missing values [Missing]
first_deposit_days has 4719 (70.4%) missing values [Missing]
first_trade_investor_account_demo_days has 3595 (53.6%) missing values [Missing]
start_mifid_days has 4831 (72.1%) zeros [Zeros]
finish_mifid_days has 800 (11.9%) zeros [Zeros]
first_deposit_days has 89 (1.3%) zeros [Zeros]
first_deposit_amount has 4719 (70.4%) zeros [Zeros]
first_deposit_platform has 728 (10.9%) zeros [Zeros]
mifid_actual_savings has 649 (9.7%) zeros [Zeros]
mifid_next_year_savings has 649 (9.7%) zeros [Zeros]
mifid_invested_other_brokers has 3587 (53.5%) zeros [Zeros]
first_trade_investor_account_demo_days has 1804 (26.9%) zeros [Zeros]

Reproduction

Analysis started: 2021-05-31 14:10:19.713938
Analysis finished: 2021-05-31 14:10:52.623421
Duration: 32.91 seconds
Software version: pandas-profiling v2.13.0
Download configuration: config.yaml

Variables

user_currency
Categorical

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 52.5 KiB

Value        Count
USD          3270
EUR          3114
GBP          317
NO_CURRENCY  2

Length

Max length: 11
Median length: 3
Mean length: 3.002386991
Min length: 3

Characters and Unicode

Total characters: 20125
Distinct characters: 13
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: EUR
2nd row: USD
3rd row: EUR
4th row: EUR
5th row: EUR

Value        Count  Frequency (%)
USD          3270   48.8%
EUR          3114   46.5%
GBP          317    4.7%
NO_CURRENCY  2      < 0.1%

Histogram of lengths of the category

Value        Count  Frequency (%)
usd          3270   48.8%
eur          3114   46.5%
gbp          317    4.7%
no_currency  2      < 0.1%

Most occurring characters

Value             Count  Frequency (%)
U                 6386   31.7%
S                 3270   16.2%
D                 3270   16.2%
R                 3118   15.5%
E                 3116   15.5%
G                 317    1.6%
B                 317    1.6%
P                 317    1.6%
N                 4      < 0.1%
C                 4      < 0.1%
Other values (3)  6      < 0.1%

Most occurring categories

Value                  Count  Frequency (%)
Uppercase Letter       20123  > 99.9%
Connector Punctuation  2      < 0.1%

Most frequent character per category

Uppercase Letter

Value             Count  Frequency (%)
U                 6386   31.7%
S                 3270   16.3%
D                 3270   16.3%
R                 3118   15.5%
E                 3116   15.5%
G                 317    1.6%
B                 317    1.6%
P                 317    1.6%
N                 4      < 0.1%
C                 4      < 0.1%
Other values (2)  4      < 0.1%

Connector Punctuation

Value  Count  Frequency (%)
_      2      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   20123  > 99.9%
Common  2      < 0.1%

Most frequent character per script

Latin

Value             Count  Frequency (%)
U                 6386   31.7%
S                 3270   16.3%
D                 3270   16.3%
R                 3118   15.5%
E                 3116   15.5%
G                 317    1.6%
B                 317    1.6%
P                 317    1.6%
N                 4      < 0.1%
C                 4      < 0.1%
Other values (2)  4      < 0.1%

Common

Value  Count  Frequency (%)
_      2      100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  20125  100.0%

Most frequent character per block

Value             Count  Frequency (%)
U                 6386   31.7%
S                 3270   16.2%
D                 3270   16.2%
R                 3118   15.5%
E                 3116   15.5%
G                 317    1.6%
B                 317    1.6%
P                 317    1.6%
N                 4      < 0.1%
C                 4      < 0.1%
Other values (3)  6      < 0.1%

user_country
Real number (ℝ≥0)

Distinct: 122
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 47.52290019
Minimum: 0
Maximum: 121
Zeros: 11
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 52.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 8
Q1: 31
Median: 36
Q3: 66.5
95-th percentile: 114
Maximum: 121
Range: 121
Interquartile range (IQR): 35.5

Descriptive statistics

Standard deviation: 29.97011402
Coefficient of variation (CV): 0.6306457286
Kurtosis: 0.1365035255
Mean: 47.52290019
Median Absolute Deviation (MAD): 11
Skewness: 1.040899302
Sum: 318546
Variance: 898.2077342
Monotonicity: Not monotonic
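
Several of these statistics are derived from one another, which makes them easy to cross-check. The sketch below recomputes them for user_country from the figures reported above; it assumes the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, range = maximum - minimum), and the variable names are illustrative.

```python
# Figures copied from the user_country statistics above.
n_observations = 6703
total = 318546            # "Sum"
mean = 47.52290019
std = 29.97011402
q1, q3 = 31, 66.5
minimum, maximum = 0, 121

mean_from_sum = total / n_observations   # no values are missing, so sum / count gives the mean
cv = std / mean                          # coefficient of variation
variance = std ** 2                      # variance is the squared standard deviation
iqr = q3 - q1                            # interquartile range
value_range = maximum - minimum

print(round(mean_from_sum, 4), round(cv, 4), round(variance, 2), iqr, value_range)
```

The same relationships hold for the other numeric variables in this report, with the count reduced by the number of missing values where there are any.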
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
36                  2201   32.8%
41                  368    5.5%
77                  354    5.3%
25                  329    4.9%
6                   270    4.0%
38                  219    3.3%
85                  209    3.1%
120                 201    3.0%
22                  196    2.9%
24                  183    2.7%
Other values (112)  2173   32.4%

Minimum 5 values

Value  Count  Frequency (%)
0      11     0.2%
1      35     0.5%
2      1      < 0.1%
3      1      < 0.1%
4      2      < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
121    63     0.9%
120    201    3.0%
119    2      < 0.1%
118    1      < 0.1%
117    39     0.6%

start_mifid_days
Real number (ℝ≥0)

ZEROS

Distinct: 354
Distinct (%): 5.3%
Missing: 3
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.49328358
Minimum: 0
Maximum: 1090
Zeros: 4831
Zeros (%): 72.1%
Negative: 0
Negative (%): 0.0%
Memory size: 52.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 116
Maximum: 1090
Range: 1090
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 83.03379577
Coefficient of variation (CV): 4.05175654
Kurtosis: 47.55451125
Mean: 20.49328358
Median Absolute Deviation (MAD): 0
Skewness: 6.274375994
Sum: 137305
Variance: 6894.61124
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
0                   4831   72.1%
1                   338    5.0%
2                   140    2.1%
3                   99     1.5%
4                   63     0.9%
5                   61     0.9%
7                   57     0.9%
6                   45     0.7%
8                   38     0.6%
11                  31     0.5%
Other values (344)  997    14.9%

Minimum 5 values

Value  Count  Frequency (%)
0      4831   72.1%
1      338    5.0%
2      140    2.1%
3      99     1.5%
4      63     0.9%

Maximum 5 values

Value  Count  Frequency (%)
1090   2      < 0.1%
1038   1      < 0.1%
967    1      < 0.1%
882    1      < 0.1%
856    1      < 0.1%

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 52.5 KiB

Value  Count
1      3811
0      2892

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 6703
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 0
5th row: 0

Value  Count  Frequency (%)
1      3811   56.9%
0      2892   43.1%

Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
1      3811   56.9%
0      2892   43.1%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  6703   100.0%

Most frequent character per category

Value  Count  Frequency (%)
1      3811   56.9%
0      2892   43.1%

Most occurring scripts

Value   Count  Frequency (%)
Common  6703   100.0%

Most frequent character per script

Value  Count  Frequency (%)
1      3811   56.9%
0      2892   43.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  6703   100.0%

Most frequent character per block

Value  Count  Frequency (%)
1      3811   56.9%
0      2892   43.1%

finish_mifid_days
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 350
Distinct (%): 9.2%
Missing: 2890
Missing (%): 43.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.05691057
Minimum: 0
Maximum: 1090
Zeros: 800
Zeros (%): 11.9%
Negative: 0
Negative (%): 0.0%
Memory size: 52.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 2
Q3: 13
95-th percentile: 221
Maximum: 1090
Range: 1090
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 105.7544829
Coefficient of variation (CV): 2.932987915
Kurtosis: 26.9007546
Mean: 36.05691057
Median Absolute Deviation (MAD): 2
Skewness: 4.765347049
Sum: 137485
Variance: 11184.01066
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
1                   834    12.4%
0                   800    11.9%
2                   398    5.9%
3                   233    3.5%
4                   135    2.0%
5                   101    1.5%
6                   75     1.1%
7                   65     1.0%
8                   62     0.9%
10                  52     0.8%
Other values (340)  1058   15.8%
(Missing)           2890   43.1%

Minimum 5 values

Value  Count  Frequency (%)
0      800    11.9%
1      834    12.4%
2      398    5.9%
3      233    3.5%
4      135    2.0%

Maximum 5 values

Value  Count  Frequency (%)
1090   1      < 0.1%
1040   1      < 0.1%
968    1      < 0.1%
936    1      < 0.1%
882    1      < 0.1%

has_deposit
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 52.5 KiB

Value  Count
0      4719
1      1984

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 6703
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Value  Count  Frequency (%)
0      4719   70.4%
1      1984   29.6%

Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
0      4719   70.4%
1      1984   29.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  6703   100.0%

Most frequent character per category

Value  Count  Frequency (%)
0      4719   70.4%
1      1984   29.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  6703   100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      4719   70.4%
1      1984   29.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  6703   100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      4719   70.4%
1      1984   29.6%

first_deposit_days
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 312
Distinct (%): 15.7%
Missing: 4719
Missing (%): 70.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 59.60685484
Minimum: 0
Maximum: 1050
Zeros: 89
Zeros (%): 1.3%
Negative: 0
Negative (%): 0.0%
Memory size: 52.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 4
Median: 11
Q3: 47
95-th percentile: 303.85
Maximum: 1050
Range: 1050
Interquartile range (IQR): 43

Descriptive statistics

Standard deviation: 126.7542907
Coefficient of variation (CV): 2.126505265
Kurtosis: 17.9245195
Mean: 59.60685484
Median Absolute Deviation (MAD): 10
Skewness: 3.849901044
Sum: 118260
Variance: 16066.6502
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
1                   148    2.2%
2                   139    2.1%
3                   102    1.5%
5                   98     1.5%
4                   94     1.4%
0                   89     1.3%
6                   80     1.2%
8                   68     1.0%
7                   65     1.0%
10                  42     0.6%
Other values (302)  1059   15.8%
(Missing)           4719   70.4%

Minimum 5 values

Value  Count  Frequency (%)
0      89     1.3%
1      148    2.2%
2      139    2.1%
3      102    1.5%
4      94     1.4%

Maximum 5 values

Value  Count  Frequency (%)
1050   1      < 0.1%
1042   1      < 0.1%
1006   1      < 0.1%
984    1      < 0.1%
956    1      < 0.1%

first_deposit_amount
Real number (ℝ≥0)

ZEROS

Distinct: 270
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.853190752
Minimum: 0
Maximum: 1000
Zeros: 4719
Zeros (%): 70.4%
Negative: 0
Negative (%): 0.0%
Memory size: 52.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1.929161201
95-th percentile: 19.29161201
Maximum: 1000
Range: 1000
Interquartile range (IQR): 1.929161201

Descriptive statistics

Standard deviation: 29.74785421
Coefficient of variation (CV): 6.129545638
Kurtosis: 432.5593816
Mean: 4.853190752
Median Absolute Deviation (MAD): 0
Skewness: 17.84337111
Sum: 32530.93761
Variance: 884.9348298
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
0                   4719   70.4%
1.929161201         564    8.4%
3.858322401         385    5.7%
7.716644803         135    2.0%
19.29161201         116    1.7%
38.58322401         82     1.2%
11.5749672          71     1.1%
2.314993441         60     0.9%
5.787483602         43     0.6%
9.645806004         26     0.4%
Other values (260)  502    7.5%

Minimum 5 values

Value          Count  Frequency (%)
0              4719   70.4%
0.03858322401  1      < 0.1%
0.1022455436   1      < 0.1%
0.1736245081   1      < 0.1%
0.1929161201   1      < 0.1%

Maximum 5 values

Value        Count  Frequency (%)
1000         1      < 0.1%
771.6644803  2      < 0.1%
771.5487306  1      < 0.1%
462.8154179  1      < 0.1%
385.8322401  7      0.1%

first_deposit_platform
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.557064001
Minimum: 0
Maximum: 6
Zeros: 728
Zeros (%): 10.9%
Negative: 0
Negative (%): 0.0%
Memory size: 52.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 1
Q3: 1
95-th percentile: 5
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.524490366
Coefficient of variation (CV): 0.9790800926
Kurtosis: 1.272916836
Mean: 1.557064001
Median Absolute Deviation (MAD): 0
Skewness: 1.638068951
Sum: 10437
Variance: 2.324070877
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common values

Value  Count  Frequency (%)
1      4719   70.4%
5      854    12.7%
0      728    10.9%
3      266    4.0%
6      69     1.0%
4      51     0.8%
2      16     0.2%

Minimum 5 values

Value  Count  Frequency (%)
0      728    10.9%
1      4719   70.4%
2      16     0.2%
3      266    4.0%
4      51     0.8%

Maximum 5 values

Value  Count  Frequency (%)
6      69     1.0%
5      854    12.7%
4      51     0.8%
3      266    4.0%
2      16     0.2%

mifid_actual_savings
Real number (ℝ≥0)

ZEROS

Distinct: 12
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.750410264
Minimum: 0
Maximum: 15
Zeros: 649
Zeros (%): 9.7%
Negative: 0
Negative (%): 0.0%
Memory size: 52.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 6
Median: 9
Q3: 12
95-th percentile: 13
Maximum: 15
Range: 15
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.04697566
Coefficient of variation (CV): 0.4624898191
Kurtosis: -0.4249155079
Mean: 8.750410264
Median Absolute Deviation (MAD): 3
Skewness: -0.7458739295
Sum: 58654
Variance: 16.37801199
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)

Common values

Value             Count  Frequency (%)
12                2071   30.9%
13                1021   15.2%
5                 747    11.1%
0                 649    9.7%
7                 645    9.6%
6                 639    9.5%
8                 430    6.4%
9                 265    4.0%
10                130    1.9%
11                64     1.0%
Other values (2)  42     0.6%

Minimum 5 values

Value  Count  Frequency (%)
0      649    9.7%
1      1      < 0.1%
5      747    11.1%
6      639    9.5%
7      645    9.6%

Maximum 5 values

Value  Count  Frequency (%)
15     41     0.6%
13     1021   15.2%
12     2071   30.9%
11     64     1.0%
10     130    1.9%

mifid_next_year_savings
Real number (ℝ≥0)

ZEROS

Distinct: 12
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.355512457
Minimum: 0
Maximum: 15
Zeros: 649
Zeros (%): 9.7%
Negative: 0
Negative (%): 0.0%
Memory size: 52.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 6
Median: 8
Q3: 12
95-th percentile: 13
Maximum: 15
Range: 15
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.081419453
Coefficient of variation (CV): 0.4884702733
Kurtosis: -0.7078882074
Mean: 8.355512457
Median Absolute Deviation (MAD): 4
Skewness: -0.4779895924
Sum: 56007
Variance: 16.65798475
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)

Common values

Value             Count  Frequency (%)
12                1412   21.1%
13                1264   18.9%
5                 1002   14.9%
6                 820    12.2%
7                 714    10.7%
0                 649    9.7%
8                 404    6.0%
9                 216    3.2%
10                109    1.6%
11                61     0.9%
Other values (2)  52     0.8%

Minimum 5 values

Value  Count  Frequency (%)
0      649    9.7%
1      1      < 0.1%
5      1002   14.9%
6      820    12.2%
7      714    10.7%

Maximum 5 values

Value  Count  Frequency (%)
15     51     0.8%
13     1264   18.9%
12     1412   21.1%
11     61     0.9%
10     109    1.6%

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 52.5 KiB

Value  Count
0      3550
1      3153

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 6703
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 0
5th row: 1

Value  Count  Frequency (%)
0      3550   53.0%
1      3153   47.0%

Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
0      3550   53.0%
1      3153   47.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  6703   100.0%

Most frequent character per category

Value  Count  Frequency (%)
0      3550   53.0%
1      3153   47.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  6703   100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      3550   53.0%
1      3153   47.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  6703   100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      3550   53.0%
1      3153   47.0%

mifid_experience
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 52.5 KiB

Value  Count
0      4589
1      2114

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 6703
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 0
5th row: 0

Value  Count  Frequency (%)
0      4589   68.5%
1      2114   31.5%

Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
0      4589   68.5%
1      2114   31.5%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  6703   100.0%

Most frequent character per category

Value  Count  Frequency (%)
0      4589   68.5%
1      2114   31.5%

Most occurring scripts

Value   Count  Frequency (%)
Common  6703   100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      4589   68.5%
1      2114   31.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  6703   100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      4589   68.5%
1      2114   31.5%

mifid_money_other_brokers
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 52.5 KiB

Value  Count
0      3587
1      3116

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 6703
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 0
5th row: 1

Value  Count  Frequency (%)
0      3587   53.5%
1      3116   46.5%

Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
0      3587   53.5%
1      3116   46.5%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  6703   100.0%

Most frequent character per category

Value  Count  Frequency (%)
0      3587   53.5%
1      3116   46.5%

Most occurring scripts

Value   Count  Frequency (%)
Common  6703   100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      3587   53.5%
1      3116   46.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  6703   100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      3587   53.5%
1      3116   46.5%

mifid_invested_other_brokers
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 11
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.656422497
Minimum: 0
Maximum: 15
Zeros: 3587
Zeros (%): 53.5%
Negative: 0
Negative (%): 0.0%
Memory size: 52.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 12
95-th percentile: 13
Maximum: 15
Range: 15
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 5.399632781
Coefficient of variation (CV): 1.159609718
Kurtosis: -1.521028421
Mean: 4.656422497
Median Absolute Deviation (MAD): 0
Skewness: 0.4913490494
Sum: 31212
Variance: 29.15603417
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=11)

Common values

Value  Count  Frequency (%)
0      3587   53.5%
12     1358   20.3%
13     515    7.7%
5      395    5.9%
6      307    4.6%
7      237    3.5%
8      151    2.3%
9      91     1.4%
10     28     0.4%
11     18     0.3%

Minimum 5 values

Value  Count  Frequency (%)
0      3587   53.5%
5      395    5.9%
6      307    4.6%
7      237    3.5%
8      151    2.3%

Maximum 5 values

Value  Count  Frequency (%)
15     16     0.2%
13     515    7.7%
12     1358   20.3%
11     18     0.3%
10     28     0.4%

user_flow_name
Categorical

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 52.5 KiB

Value  Count
3      3455
0      3017
2      202
1      29

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 6703
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 3
3rd row: 3
4th row: 3
5th row: 3

Value  Count  Frequency (%)
3      3455   51.5%
0      3017   45.0%
2      202    3.0%
1      29     0.4%

Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
3      3455   51.5%
0      3017   45.0%
2      202    3.0%
1      29     0.4%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  6703   100.0%

Most frequent character per category

Value  Count  Frequency (%)
3      3455   51.5%
0      3017   45.0%
2      202    3.0%
1      29     0.4%

Most occurring scripts

Value   Count  Frequency (%)
Common  6703   100.0%

Most frequent character per script

Value  Count  Frequency (%)
3      3455   51.5%
0      3017   45.0%
2      202    3.0%
1      29     0.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  6703   100.0%

Most frequent character per block

Value  Count  Frequency (%)
3      3455   51.5%
0      3017   45.0%
2      202    3.0%
1      29     0.4%

first_trade_investor_account_demo_days
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 206
Distinct (%): 6.6%
Missing: 3595
Missing (%): 53.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 16.84137709
Minimum: 0
Maximum: 957
Zeros: 1804
Zeros (%): 26.9%
Negative: 0
Negative (%): 0.0%
Memory size: 52.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 3
95-th percentile: 88.65
Maximum: 957
Range: 957
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 67.47409902
Coefficient of variation (CV): 4.006447849
Kurtosis: 58.25967699
Mean: 16.84137709
Median Absolute Deviation (MAD): 0
Skewness: 6.765513322
Sum: 52343
Variance: 4552.754039
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
0                   1804   26.9%
1                   313    4.7%
2                   202    3.0%
3                   103    1.5%
4                   61     0.9%
5                   60     0.9%
7                   36     0.5%
6                   35     0.5%
8                   22     0.3%
10                  20     0.3%
Other values (196)  452    6.7%
(Missing)           3595   53.6%

Minimum 5 values

Value  Count  Frequency (%)
0      1804   26.9%
1      313    4.7%
2      202    3.0%
3      103    1.5%
4      61     0.9%

Maximum 5 values

Value  Count  Frequency (%)
957    1      < 0.1%
898    1      < 0.1%
852    1      < 0.1%
737    1      < 0.1%
691    1      < 0.1%

conversion
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 52.5 KiB

Value  Count
0      5076
1      1627

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 6703
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Value  Count  Frequency (%)
0      5076   75.7%
1      1627   24.3%

Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
0      5076   75.7%
1      1627   24.3%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  6703   100.0%

Most frequent character per category

Value  Count  Frequency (%)
0      5076   75.7%
1      1627   24.3%

Most occurring scripts

Value   Count  Frequency (%)
Common  6703   100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      5076   75.7%
1      1627   24.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  6703   100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      5076   75.7%
1      1627   24.3%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r; only the sign of its slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
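
The calculation described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the code the report generator runs; the function name `pearson_r` and the toy values are assumptions made up for the example.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/(n-1) factors in the covariance and in both standard deviations
    # cancel, so plain sums of deviations are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Toy values, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)

# Shifting and positively rescaling y should leave r unchanged.
r_rescaled = pearson_r(x, [3 * v + 7 for v in y])
```

Comparing `r` with `r_rescaled` illustrates the location and scale invariance mentioned above.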

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
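
As a rough sketch of that recipe, the snippet below ranks each variable and then applies the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the toy data are hypothetical; assigning tied values their average rank is a common convention and is assumed here rather than taken from the report.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Plain Pearson correlation, used here on rank variables."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy values, invented for illustration: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
```

On this example `rho` reaches 1 (up to rounding) while `r` stays below 1, which is the difference between monotonic and linear correlation described above.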

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
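
The pair-counting definition above corresponds to the variant usually called tau-a, which the following sketch implements directly. The function name `kendall_tau_a` and the toy data are assumptions for illustration; statistical libraries often report the tie-adjusted tau-b variant instead, so results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
        # Pairs tied in x or y count as neither concordant nor discordant.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy values, invented for illustration: one swapped pair out of six.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
```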

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
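
A hedged sketch of a bias-corrected Cramér's V follows. It uses the correction commonly attributed to Bergsma (2013): the chi-square based φ² is reduced by (k-1)(r-1)/(n-1), and the numbers of rows and columns are shrunk before normalising. The function name, the toy data and the choice to return 0.0 when the corrected table is degenerate are assumptions of this example, not details taken from the report.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two sequences of category labels."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    observed = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)

    # Pearson chi-square statistic over every cell, including empty ones.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected

    phi2 = chi2 / n
    r, k = len(row_totals), len(col_totals)
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0  # assumption: treat a degenerate table as "no association"
    return math.sqrt(phi2_corr / denom)


# Toy labels, invented for illustration only.
currency = ["USD"] * 50 + ["EUR"] * 50
same_split = ["a"] * 50 + ["b"] * 50      # determined entirely by currency
unrelated = ["a", "b"] * 50               # evenly spread within each currency

v_perfect = cramers_v_corrected(currency, same_split)
v_independent = cramers_v_corrected(currency, unrelated)
```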

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
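
The quantities behind the nullity bar chart and the nullity correlation heatmap can be approximated as follows: the share of missing entries per column, and the Pearson correlation between the present/absent indicators of two columns. This is a simplified sketch with hypothetical helper names; the column names come from the warnings in this report, but the values are invented toy data, with `None` standing in for a missing cell.

```python
import math


def missing_fraction(column):
    """Share of entries in the column that are missing (None)."""
    return sum(v is None for v in column) / len(column)


def nullity_correlation(col_a, col_b):
    """Pearson correlation of the presence indicators (1 = present, 0 = missing)."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    ss_a = sum((p - mean_a) ** 2 for p in a)
    ss_b = sum((q - mean_b) ** 2 for q in b)
    if ss_a == 0 or ss_b == 0:
        return float("nan")  # a fully present or fully missing column has no variance
    return cov / math.sqrt(ss_a * ss_b)


# Toy values, invented for illustration only.
finish_mifid_days = [None, None, 0.0, None, 3.0, 1.0]
first_deposit_days = [None, None, None, None, 3.0, 199.0]

share_missing = missing_fraction(finish_mifid_days)
nullity_r = nullity_correlation(finish_mifid_days, first_deposit_days)
```

A value of `nullity_r` near +1 means the two columns tend to be present or missing together; a value near -1 means one tends to be present when the other is missing.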

Sample

First rows

user_currency  user_country  start_mifid_days  has_finished_mifid  finish_mifid_days  has_deposit  first_deposit_days  first_deposit_amount  first_deposit_platform  mifid_actual_savings  mifid_next_year_savings  mifid_qualifications  mifid_experience  mifid_money_other_brokers  mifid_invested_other_brokers  user_flow_name  first_trade_investor_account_demo_days  conversion
0EUR360.00NaN0NaN0.018800180NaN0
1USD770.00NaN0NaN0.011213001123NaN0
2EUR290.010.00NaN0.018811153NaN0
3EUR360.00NaN0NaN0.01121200003NaN0
4EUR360.00NaN0NaN0.017710183NaN0
5EUR360.00NaN0NaN0.010000003NaN0
6EUR360.013.00NaN0.0112121011220.00
7USD250.011.00NaN0.0112130011231.00
8EUR240.010.00NaN0.018600003NaN0
9USD260.013.00NaN0.01121310003NaN0

Last rows

user_currency  user_country  start_mifid_days  has_finished_mifid  finish_mifid_days  has_deposit  first_deposit_days  first_deposit_amount  first_deposit_platform  mifid_actual_savings  mifid_next_year_savings  mifid_qualifications  mifid_experience  mifid_money_other_brokers  mifid_invested_other_brokers  user_flow_name  first_trade_investor_account_demo_days  conversion
6693EUR360.011.013.011.57496756131100026.01
6694EUR580.010.01199.03.8583225131310000NaN1
6695USD91811.01829.011006.01.9291610135101120NaN0
6696EUR580.010.012.01.92916151212100000.01
6697EUR360.00NaN0NaN0.00000010000000NaN0
6698EUR36246.01246.01248.04.62998751212000000.01
6699EUR380.00NaN0NaN0.000000100000000.00
6700EUR580.010.0163.03.858322555110000.01
6701USD20167.01169.01196.01.929161012120011200.00
6702USD840.00NaN0NaN0.00000010000000NaN0